COVID-19 data

This lab is adapted from “Human Genome Analysis Lab 9: Working with COVID-19 reporting data” by Jeffrey Blanchard

Objectives

- Access COVID-19 data remotely
- Understand the reports
- Create graphs and maps to visualize COVID-19 data

To install the libraries, run install.packages("package_name"). Once they are installed, you can load them into your R environment.
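As a minimal sketch of that install step, the following checks which of the packages loaded later in this lab are not yet installed, so that only those need to be passed to install.packages(). The helper name `missing_packages` is hypothetical and not part of the original lab.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Packages loaded later in this lab.
lab_packages <- c("tidyverse", "maps", "mapdata", "lubridate",
                  "viridis", "wesanderson", "gganimate", "transformr")

to_install <- missing_packages(lab_packages)
to_install

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install call is left commented out so that running the block has no side effects until you decide to install.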

The name SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) was recently established for the virus on the basis of several phylogenetic analyses. The disease caused by the virus is known as COVID-19. In this lab we will work with the daily updated COVID-19 case counts (confirmed cases, deaths and recoveries) from the Johns Hopkins University database. The researchers Ensheng Dong, Hongru Du and Lauren Gardner developed a platform to monitor COVID-19 reports in real time.

The data are collected from the following sources: World Health Organization (WHO) | DXY.cn. Pneumonia. 2020 | BNO News | National Health Commission of the People’s Republic of China (NHC) | China CDC (CCDC) | Hong Kong Department of Health | Macau Government | Taiwan CDC | US CDC | Government of Canada | Australia Government Department of Health | European Centre for Disease Prevention and Control (ECDC) | Ministry of Health Singapore (MOH) | Italy Ministry of Health | 1Point3Acres | Worldometers

It is important to understand that the reported data depend on COVID-19 testing. In many countries, a number of cases may therefore be going uncounted because of a shortage of tests for the virus. Even so, daily cases keep growing exponentially, steepening the infection curve. The idea of this lab is for you to share this information with friends and family, so that they understand the importance of preventive isolation and of the precautions that avoid contagion and flatten the infection curve. #SocialDistancing #COVID-19 #CuarentenaPreventiva

Johns Hopkins University GitHub reports

GitHub platform for COVID-19 reports

Files used in this lab

Reports of confirmed cases, deaths and recoveries

csse_covid_19_data | csse_covid_19_daily_reports | 03-24-2020.csv
The files contain several columns with the following information: Province/State, Country/Region, Last Update, Confirmed, Deaths, Recovered, Latitude, Longitude

Reports of cases over time

csse_covid_19_data | csse_covid_19_time_series | time_series_covid19_confirmed_global.csv
The files contain columns with the following information: Province/State, Country/Region, Lat, Long, 1/22/20, 1/23/20, 1/24/20…

Graphs and daily reports

Next we will use the tidyverse and lubridate libraries to reshape the data. Follow these steps:
1. Go to the file you want to download
2. Click the “Raw” button at the top right
3. Choose “Save as”, or copy the URL
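The raw URLs of the daily reports follow a regular pattern, so instead of copying each one by hand you could build it from a date. This is a minimal sketch: the helper name `daily_report_url` is hypothetical, and it assumes the repository keeps the MM-DD-YYYY.csv naming used by the files in this lab.

```r
# Hypothetical helper: build the raw GitHub URL of one daily report.
daily_report_url <- function(date) {
  base <- paste0("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/",
                 "master/csse_covid_19_data/csse_covid_19_daily_reports/")
  # Daily report files are named MM-DD-YYYY.csv
  paste0(base, format(as.Date(date), "%m-%d-%Y"), ".csv")
}

daily_report_url("2020-03-24")
```

The resulting string can be passed to read_csv(url(...)) in the same way as the hard-coded URLs used in this lab.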

library(tidyverse)
library(maps)
library(mapdata)
library(lubridate)
library(viridis)
library(wesanderson)

We will create an object called report_03_24_2020 from the report for March 24, 2020.

report_03_24_2020 <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-24-2020.csv")) %>%
  select(-FIPS, -Admin2)
## Parsed with column specification:
## cols(
##   FIPS = col_character(),
##   Admin2 = col_character(),
##   Province_State = col_character(),
##   Country_Region = col_character(),
##   Last_Update = col_datetime(format = ""),
##   Lat = col_double(),
##   Long_ = col_double(),
##   Confirmed = col_double(),
##   Deaths = col_double(),
##   Recovered = col_double(),
##   Active = col_double(),
##   Combined_Key = col_character()
## )
head(report_03_24_2020)
## # A tibble: 6 x 10
##   Province_State Country_Region Last_Update           Lat  Long_ Confirmed
##   <chr>          <chr>          <dttm>              <dbl>  <dbl>     <dbl>
## 1 South Carolina US             2020-03-24 23:37:31  34.2  -82.5         1
## 2 Louisiana      US             2020-03-24 23:37:31  30.3  -92.4         2
## 3 Virginia       US             2020-03-24 23:37:31  37.8  -75.6         1
## 4 Idaho          US             2020-03-24 23:37:31  43.5 -116.         19
## 5 Iowa           US             2020-03-24 23:37:31  41.3  -94.5         1
## 6 Kentucky       US             2020-03-24 23:37:31  37.1  -85.3         0
## # … with 4 more variables: Deaths <dbl>, Recovered <dbl>, Active <dbl>,
## #   Combined_Key <chr>
str(report_03_24_2020)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 3417 obs. of  10 variables:
##  $ Province_State: chr  "South Carolina" "Louisiana" "Virginia" "Idaho" ...
##  $ Country_Region: chr  "US" "US" "US" "US" ...
##  $ Last_Update   : POSIXct, format: "2020-03-24 23:37:31" "2020-03-24 23:37:31" ...
##  $ Lat           : num  34.2 30.3 37.8 43.5 41.3 ...
##  $ Long_         : num  -82.5 -92.4 -75.6 -116.2 -94.5 ...
##  $ Confirmed     : num  1 2 1 19 1 0 1 0 25 0 ...
##  $ Deaths        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Recovered     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Active        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Combined_Key  : chr  "Abbeville, South Carolina, US" "Acadia, Louisiana, US" "Accomack, Virginia, US" "Ada, Idaho, US" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   FIPS = col_character(),
##   ..   Admin2 = col_character(),
##   ..   Province_State = col_character(),
##   ..   Country_Region = col_character(),
##   ..   Last_Update = col_datetime(format = ""),
##   ..   Lat = col_double(),
##   ..   Long_ = col_double(),
##   ..   Confirmed = col_double(),
##   ..   Deaths = col_double(),
##   ..   Recovered = col_double(),
##   ..   Active = col_double(),
##   ..   Combined_Key = col_character()
##   .. )

Report for March 11, 2020. Some older reports use a different format; in this case we have to rename the Province/State and Country/Region columns.

report_03_11_2020 <-   read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-11-2020.csv")) %>% 
  rename(Country_Region = "Country/Region", Province_State = "Province/State")
## Parsed with column specification:
## cols(
##   `Province/State` = col_character(),
##   `Country/Region` = col_character(),
##   `Last Update` = col_datetime(format = ""),
##   Confirmed = col_double(),
##   Deaths = col_double(),
##   Recovered = col_double(),
##   Latitude = col_double(),
##   Longitude = col_double()
## )
head(report_03_11_2020)
## # A tibble: 6 x 8
##   Province_State Country_Region `Last Update`       Confirmed Deaths Recovered
##   <chr>          <chr>          <dttm>                  <dbl>  <dbl>     <dbl>
## 1 Hubei          China          2020-03-11 10:53:02     67773   3046     49134
## 2 <NA>           Italy          2020-03-11 21:33:02     12462    827      1045
## 3 <NA>           Iran           2020-03-11 18:52:03      9000    354      2959
## 4 <NA>           Korea, South   2020-03-11 21:13:18      7755     60       288
## 5 France         France         2020-03-11 22:53:03      2281     48        12
## 6 <NA>           Spain          2020-03-11 20:53:02      2277     54       183
## # … with 2 more variables: Latitude <dbl>, Longitude <dbl>
str(report_03_11_2020)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 216 obs. of  8 variables:
##  $ Province_State: chr  "Hubei" NA NA NA ...
##  $ Country_Region: chr  "China" "Italy" "Iran" "Korea, South" ...
##  $ Last Update   : POSIXct, format: "2020-03-11 10:53:02" "2020-03-11 21:33:02" ...
##  $ Confirmed     : num  67773 12462 9000 7755 2281 ...
##  $ Deaths        : num  3046 827 354 60 48 ...
##  $ Recovered     : num  49134 1045 2959 288 12 ...
##  $ Latitude      : num  31 43 32 36 46.2 ...
##  $ Longitude     : num  112.27 12 53 128 2.21 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Province/State` = col_character(),
##   ..   `Country/Region` = col_character(),
##   ..   `Last Update` = col_datetime(format = ""),
##   ..   Confirmed = col_double(),
##   ..   Deaths = col_double(),
##   ..   Recovered = col_double(),
##   ..   Latitude = col_double(),
##   ..   Longitude = col_double()
##   .. )
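Because the older and newer daily reports name their columns differently, it can help to bring every report to one naming scheme before comparing them. The sketch below uses base R only. The helper `harmonize_report` is hypothetical, and the mapping is taken from the column names printed in the two reports above.

```r
# Hypothetical helper: rename old-style columns to the newer naming scheme.
harmonize_report <- function(df) {
  lookup <- c("Province/State" = "Province_State",
              "Country/Region" = "Country_Region",
              "Last Update"    = "Last_Update",
              "Latitude"       = "Lat",
              "Longitude"      = "Long_")
  old <- names(df) %in% names(lookup)
  # Only columns that use an old-style name are touched
  names(df)[old] <- unname(lookup[names(df)[old]])
  df
}

# Made-up one-row table in the old format
old_style <- data.frame(`Province/State` = "Hubei", `Country/Region` = "China",
                        Confirmed = 67773, check.names = FALSE)
names(harmonize_report(old_style))
```

Columns that already use the newer names are left unchanged, so the helper can be applied to any report regardless of its date.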

Confirmed cases in US by March 11

report_03_11_2020 %>% 
  filter(Country_Region == "US") %>% 
  group_by(Province_State) %>% 
  summarise(Confirmed = sum(Confirmed)) %>% 
  ggplot(aes(x = Confirmed,  y = reorder(Province_State, Confirmed))) + 
    geom_point() +
    ggtitle("Confirmed cases for each US State March 11") +
    ylab("Province/State") +
    xlab("Confirmed Cases")

Confirmed cases in US by March 24

report_03_24_2020 %>% 
  filter(Country_Region == "US") %>% 
  group_by(Province_State) %>% 
  summarise(Confirmed = sum(Confirmed)) %>% 
  ggplot(aes(x = Confirmed,  y = reorder(Province_State, Confirmed))) + 
    geom_point() +
    ggtitle("Confirmed cases for each US State March 24") +
    ylab("Province/State") +
    xlab("Confirmed Cases")

Confirmed cases in China by March 24

report_03_24_2020 %>% 
  filter(Country_Region == "China") %>% 
  group_by(Province_State) %>% 
  summarise(Confirmed = sum(Confirmed)) %>% 
  ggplot(aes(x = Confirmed,  y = reorder(Province_State, Confirmed))) + 
    geom_point() +
    ggtitle("Confirmed cases for each China region March 24") +
    ylab("Province/State") +
    xlab("Confirmed Cases")

Note that countries such as China and the US have multiple observations (states/regions), which is why this kind of plot is possible. Other countries, such as Colombia and Spain, have a single value per variable, that is, the total number of cases for the country.

To plot the total number of cases per country, we have to sum each variable over all the rows that belong to that country.

report_03_24_2020 %>% 
  group_by(Country_Region) %>% 
  summarise(Deaths = sum(Deaths)) %>% 
  arrange(desc(Deaths))
## # A tibble: 169 x 2
##    Country_Region Deaths
##    <chr>           <dbl>
##  1 Italy            6820
##  2 China            3281
##  3 Spain            2808
##  4 Iran             1934
##  5 France           1102
##  6 US                706
##  7 United Kingdom    423
##  8 Netherlands       277
##  9 Germany           157
## 10 Belgium           122
## # … with 159 more rows
#Compare the values between the March 11 and March 24 reports. Which country has the most reported deaths on March 11, and which on March 24?
#Repeat the task with the confirmed cases, and separately with the recovered cases

Reported deaths by March 24

report_03_24_2020 %>% 
  group_by(Country_Region) %>% 
  summarise(Deaths = sum(Deaths)) %>% 
  arrange(desc(Deaths)) %>% 
  slice(1:20) %>% 
  ggplot(aes(x = Deaths,  y = reorder(Country_Region, Deaths))) + 
    geom_bar(stat = 'identity') +
    ggtitle("The 20 countries with the most reported COVID-19-related deaths") +
    ylab("Country/Region") +
    xlab("Deaths")

Exercise 1

Make at least two plots that differ from the examples above

Reports of cases over time

time_series_confirmed <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")) %>%
  rename(Province_State = "Province/State", Country_Region = "Country/Region")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Province/State` = col_character(),
##   `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.

Inspect the table

head(time_series_confirmed)
## # A tibble: 6 x 75
##   Province_State Country_Region   Lat   Long `1/22/20` `1/23/20` `1/24/20`
##   <chr>          <chr>          <dbl>  <dbl>     <dbl>     <dbl>     <dbl>
## 1 <NA>           Afghanistan     33    65            0         0         0
## 2 <NA>           Albania         41.2  20.2          0         0         0
## 3 <NA>           Algeria         28.0   1.66         0         0         0
## 4 <NA>           Andorra         42.5   1.52         0         0         0
## 5 <NA>           Angola         -11.2  17.9          0         0         0
## 6 <NA>           Antigua and B…  17.1 -61.8          0         0         0
## # … with 68 more variables: `1/25/20` <dbl>, `1/26/20` <dbl>, `1/27/20` <dbl>,
## #   `1/28/20` <dbl>, `1/29/20` <dbl>, `1/30/20` <dbl>, `1/31/20` <dbl>,
## #   `2/1/20` <dbl>, `2/2/20` <dbl>, `2/3/20` <dbl>, `2/4/20` <dbl>,
## #   `2/5/20` <dbl>, `2/6/20` <dbl>, `2/7/20` <dbl>, `2/8/20` <dbl>,
## #   `2/9/20` <dbl>, `2/10/20` <dbl>, `2/11/20` <dbl>, `2/12/20` <dbl>,
## #   `2/13/20` <dbl>, `2/14/20` <dbl>, `2/15/20` <dbl>, `2/16/20` <dbl>,
## #   `2/17/20` <dbl>, `2/18/20` <dbl>, `2/19/20` <dbl>, `2/20/20` <dbl>,
## #   `2/21/20` <dbl>, `2/22/20` <dbl>, `2/23/20` <dbl>, `2/24/20` <dbl>,
## #   `2/25/20` <dbl>, `2/26/20` <dbl>, `2/27/20` <dbl>, `2/28/20` <dbl>,
## #   `2/29/20` <dbl>, `3/1/20` <dbl>, `3/2/20` <dbl>, `3/3/20` <dbl>,
## #   `3/4/20` <dbl>, `3/5/20` <dbl>, `3/6/20` <dbl>, `3/7/20` <dbl>,
## #   `3/8/20` <dbl>, `3/9/20` <dbl>, `3/10/20` <dbl>, `3/11/20` <dbl>,
## #   `3/12/20` <dbl>, `3/13/20` <dbl>, `3/14/20` <dbl>, `3/15/20` <dbl>,
## #   `3/16/20` <dbl>, `3/17/20` <dbl>, `3/18/20` <dbl>, `3/19/20` <dbl>,
## #   `3/20/20` <dbl>, `3/21/20` <dbl>, `3/22/20` <dbl>, `3/23/20` <dbl>,
## #   `3/24/20` <dbl>, `3/25/20` <dbl>, `3/26/20` <dbl>, `3/27/20` <dbl>,
## #   `3/28/20` <dbl>, `3/29/20` <dbl>, `3/30/20` <dbl>, `3/31/20` <dbl>,
## #   `4/1/20` <dbl>

Here we need to transform the data frame for later plots. The format above is known as ‘wide format’ (many columns) and will be transformed into ‘long format’ (many rows).

time_series_confirmed_long <- time_series_confirmed %>% 
               pivot_longer(-c(Province_State, Country_Region, Lat, Long),
                            names_to = "Date", values_to = "Confirmed") 
library(maps)
library(viridis)
world <- map_data("world")
mybreaks <- c(1, 20, 100, 1000, 50000)
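To make the wide-to-long step concrete, here is a small sketch on a made-up table (the countries "A" and "B" and their counts are invented for illustration). It uses the same pivot_longer() call as above and then converts the former column names into real dates with lubridate's mdy().

```r
library(tidyr)
library(lubridate)

# Toy wide table: one row per country, one column per date (made-up counts).
toy_wide <- data.frame(
  Country_Region = c("A", "B"),
  `1/22/20` = c(0, 1),
  `1/23/20` = c(2, 3),
  check.names = FALSE
)

toy_long <- pivot_longer(toy_wide, -Country_Region,
                         names_to = "Date", values_to = "Confirmed")

# The date column names arrive as text, so convert them to Date values.
toy_long$Date <- mdy(toy_long$Date)
toy_long
```

Two rows with two date columns become four rows, one per country and date, which is the shape that ggplot2 expects for time series.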

Global confirmed cases by March 11, 2020

ggplot() +
  geom_polygon(data = world, aes(x=long, y = lat, group = group), fill="grey",colour = "black", alpha=0.3) +
  geom_point(data=time_series_confirmed, aes(x=Long, y=Lat, size=`3/11/20`, color=`3/11/20`),stroke=F, alpha=0.7) +
  scale_size_continuous(name="Cases", trans="log", range=c(1,7),breaks=mybreaks, labels = c("1-19", "20-99", "100-999", "1,000-49,999", "50,000+")) +
  # scale_alpha_continuous(name="Cases", trans="log", range=c(0.1, 0.9),breaks=mybreaks) +
  scale_color_viridis_c(option="inferno",name="Cases", trans="log",breaks=mybreaks, labels = c("1-19", "20-99", "100-999", "1,000-49,999", "50,000+")) +
  theme_void() + 
  guides( colour = guide_legend()) +
  labs(caption = "") +
  theme(
    legend.position = "bottom",
    text = element_text(color = "#22211d"),
    plot.background = element_rect(fill = "#ffffff", color = NA), 
    panel.background = element_rect(fill = "#ffffff", color = NA), 
    legend.background = element_rect(fill = "#ffffff", color = NA)
  )+
      ggtitle("Confirmed COVID-19 Cases March 11, 2020")
## Warning: Transformation introduced infinite values in discrete y-axis

## Warning: Transformation introduced infinite values in discrete y-axis
## Warning in sqrt(x): NaNs produced
## Warning: Removed 94 rows containing missing values (geom_point).
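The warnings above are most likely caused by locations that had zero confirmed cases on that date: the size and color scales use trans="log", and the logarithm of zero is negative infinity. A minimal sketch with made-up counts, assuming you want to drop such locations before plotting:

```r
cases <- c(0, 1, 20, 1000)   # made-up counts
log(cases)                   # the first value is -Inf

# Keep only locations with at least one case
positive_cases <- cases[cases > 0]
all(is.finite(log(positive_cases)))
```

With the real data, the same idea could be expressed as a filter such as filter(`3/11/20` > 0) applied to time_series_confirmed before it is passed to geom_point(). This is a suggestion, not part of the original lab.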

Global confirmed cases for March 31, 2020

daily_report_03_31_2020 <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-31-2020.csv")) %>% 
  rename(Long = "Long_") %>% 
  select(-Admin2, -FIPS)
    
ggplot(daily_report_03_31_2020, aes(x = Long, y = Lat, size = Confirmed/1000)) +
    borders("world",fill = "grey90", colour=NA) + #additional option: colour = "black" instead of NA
    theme_bw() +
    geom_point(shape = 21, color='blue', fill='blue', alpha = 0.5) +
    labs(title = 'World COVID-19 Confirmed cases by March 31, 2020',x = '', y = '',
        size="Cases (x1000)") +
    theme(legend.position = "right") +
    coord_fixed(ratio=1.5)
## Warning: Removed 2 rows containing missing values (geom_point).

Confirmed COVID-19 cases in the US by state

US March 31, 2020

mybreaksUS <- c(1, 100, 1000, 10000, 50000)
daily_report_03_31_2020 %>% 
  filter(Country_Region == "US") %>% 
  filter (!Province_State %in% c("Alaska","Hawaii", "American Samoa",
                  "Puerto Rico","Northern Mariana Islands", 
                  "Virgin Islands", "Recovered", "Guam", "Grand Princess",
                  "District of Columbia", "Diamond Princess")) %>% 
  filter(Lat > 0) %>% 
ggplot(aes(x = Long, y = Lat, size = Confirmed)) +
    borders("state", colour = "white", fill = "grey90") +
    geom_point(aes(x=Long, y=Lat, size=Confirmed, color=Confirmed),stroke=FALSE, alpha=0.7) +
    # Use the US-specific breaks so that the legend labels match the break values
    scale_size_continuous(name="Cases", trans="log", range=c(1,7), 
                        breaks=mybreaksUS, labels = c("1-99",
                        "100-999", "1,000-9,999", "10,000-49,999", "50,000+")) +
    scale_color_viridis_c(option="viridis",name="Cases",
                        trans="log", breaks=mybreaksUS, labels = c("1-99",
                        "100-999", "1,000-9,999", "10,000-49,999", "50,000+"))  +
# Cleaning up the graph
  
  theme_void() + 
    guides( colour = guide_legend()) +
    labs(title = "COVID-19 Confirmed Cases in the US by March 31, 2020") +
    theme(
      legend.position = "bottom",
      text = element_text(color = "#22211d"),
      plot.background = element_rect(fill = "#ffffff", color = NA), 
      panel.background = element_rect(fill = "#ffffff", color = NA), 
      legend.background = element_rect(fill = "#ffffff", color = NA)
    ) +
    coord_fixed(ratio=1.5)
## Warning: Transformation introduced infinite values in discrete y-axis

## Warning: Transformation introduced infinite values in discrete y-axis
## Warning in sqrt(x): NaNs produced
## Warning: Removed 7 rows containing missing values (geom_point).
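The legend labels in these maps are typed by hand, so they can drift out of sync with the break values. One way to avoid that is to derive the labels from the breaks. The helper `range_labels` below is a hypothetical sketch, not part of the original lab, and assumes the breaks are increasing whole numbers.

```r
# Hypothetical helper: turn break values into "lower-upper" range labels.
range_labels <- function(breaks) {
  fmt <- function(x) formatC(x, format = "d", big.mark = ",")
  n <- length(breaks)
  # Each range ends one below the next break; the last range is open-ended
  upper <- breaks[-1] - 1
  c(paste0(fmt(breaks[-n]), "-", fmt(upper)),
    paste0(fmt(breaks[n]), "+"))
}

range_labels(c(1, 100, 1000, 10000, 50000))
```

The returned character vector has one label per break, so it can be passed as the labels argument of scale_size_continuous() and scale_color_viridis_c() together with the same breaks vector.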

Exercise

Plot the global map of confirmed cases for March 25, 2020. What differences do you find compared with the March maps above?

Individual maps

time_series_confirmed_long2 <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")) %>%
    rename(Province.State = "Province/State", Country.Region = "Country/Region") %>%
  pivot_longer(-c(Province.State, Country.Region, Lat, Long),
    names_to = "Date", values_to = "cumulative_cases") %>%
    mutate(Date = mdy(Date) - days(1),
        Place = paste(Lat,Long,sep="_")) %>%
    group_by(Place,Date) %>%
        summarise(cumulative_cases = ifelse(sum(cumulative_cases)>0,
        sum(cumulative_cases),NA_real_),
        Lat = mean(Lat),
        Long = mean(Long)) %>%
    mutate(Pandemic_day = as.numeric(Date - min(Date)))
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Province/State` = col_character(),
##   `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
ggplot(subset(time_series_confirmed_long2, Date %in% seq(min(Date),max(Date),7)),
            aes(x = Long, y = Lat, size = cumulative_cases/1000)) +
            borders("world", colour = NA, fill = "grey90") +
            theme_bw() +
            geom_point(shape = 21, color='purple', fill='purple', alpha = 0.5) +
            labs(title = 'COVID-19 spread',x = '', y = '',
                 size="Cases (x1000)") +
            theme(legend.position = "right") +
            coord_fixed(ratio=1)+
            facet_wrap(.~Date,nrow=3)
## Warning: Removed 1415 rows containing missing values (geom_point).

Latin America

#Latin countries
some.latin <- c("Colombia","Brazil","Peru","Ecuador","Chile","Venezuela","Bolivia","Argentina","Paraguay","Uruguay")
#retrieve map data
some.latin <- map_data("world", region = some.latin)
#Coordinates of countries
region.coord.data <- some.latin %>%
  group_by(region) %>%
  summarise(long = mean(long), lat = mean(lat))
#Confirmed cases of COVID-19 in Latin America 
time_series_confirmed_latin <- time_series_confirmed %>% 
  filter (Country_Region %in% c("Colombia","Brazil","Peru","Ecuador","Chile","Venezuela","Bolivia","Argentina","Paraguay","Uruguay"))

#You can plot data of recoveries, deaths and confirmed cases separately

COVID-19 Confirmed Cases in South America reported on March 24/20

mybreaks3 <- c(1, 50, 100, 200, 300, 1000, 2000)
ggplot() +
    geom_polygon(data = some.latin, aes(x=long, y = lat, group = group), fill="grey", alpha=0.3,colour='black') + #additional option colour="white"
  geom_point(data=time_series_confirmed_latin, aes(x=Long, y=Lat, size=`3/24/20`, color=`3/24/20`),stroke=F, alpha=0.7)+
  scale_size_continuous(name="Cases", trans="log", range=c(1,20),breaks=mybreaks3, labels = c("1-49", "50-99", "100-199", "200-299", "300-999", "1000-1999",'2000+')) +
  # scale_alpha_continuous(name="Cases", trans="log", range=c(0.1, 0.9),breaks=mybreaks3) +
  scale_color_viridis_c(option="inferno",name="Cases", trans="log",breaks=mybreaks3, labels = c("1-49","50-99", "100-199", "200-299", "300-999", "1000-1999",'2000+')) +
  theme_void() + 
  guides( colour = guide_legend()) +
  labs(caption = "") +
  theme(
    legend.position = "bottom",
    legend.text = element_text(size = 20),
    legend.title = element_text(size = 20),
    text = element_text(color = "#22211d"),
    plot.background = element_rect(fill = "#ffffff", color = NA), 
    panel.background = element_rect(fill = "#ffffff", color = NA), 
    legend.background = element_rect(fill = "#ffffff", color = NA))+
  geom_text(aes(x=long, y=lat,label=region), data = region.coord.data, size =5, hjust=0.5)+
   ggtitle("COVID-19 Confirmed Cases in South America reported on March 24/20")

library(ggplot2)
library(gganimate)
library(transformr)
theme_set(theme_bw())
#install.packages("sf",dependencies = T)
#install.packages("devtools")
#devtools::install_github("thomasp85/transformr")
time_series_confirmed_long <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")) %>%
  rename(Province_State = "Province/State", Country_Region = "Country/Region")  %>% 
               pivot_longer(-c(Province_State, Country_Region, Lat, Long),
                            names_to = "Date", values_to = "Confirmed") 
# Let's get the times series data for deaths
time_series_deaths_long <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv")) %>%
  rename(Province_State = "Province/State", Country_Region = "Country/Region")  %>% 
  pivot_longer(-c(Province_State, Country_Region, Lat, Long),
               names_to = "Date", values_to = "Deaths")
time_series_recovered_long <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv")) %>%
  rename(Province_State = "Province/State", Country_Region = "Country/Region") %>% 
  pivot_longer(-c(Province_State, Country_Region, Lat, Long),
               names_to = "Date", values_to = "Recovered")
# Create Keys 
time_series_confirmed_long <- time_series_confirmed_long %>% 
  unite(Key, Province_State, Country_Region, Date, sep = ".", remove = FALSE)
time_series_deaths_long <- time_series_deaths_long %>% 
  unite(Key, Province_State, Country_Region, Date, sep = ".") %>% 
  select(Key, Deaths)
time_series_recovered_long <- time_series_recovered_long %>% 
  unite(Key, Province_State, Country_Region, Date, sep = ".") %>% 
  select(Key, Recovered)
# Join tables
time_series_long_joined <- full_join(time_series_confirmed_long,
              time_series_deaths_long, by = c("Key"))
time_series_long_joined <- full_join(time_series_long_joined,
              time_series_recovered_long, by = c("Key")) %>% 
    select(-Key)
# Reformat the data
time_series_long_joined$Date <- mdy(time_series_long_joined$Date)
# Create Report table with counts
time_series_long_joined_counts <- time_series_long_joined %>% 
  pivot_longer(-c(Province_State, Country_Region, Lat, Long, Date),
               names_to = "Report_Type", values_to = "Counts")
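The united Key column above is one way to match rows across the three tables. As an alternative sketch (not the lab's method), dplyr can join directly on several columns at once, which avoids building and later dropping the key. The tables below are made up for illustration.

```r
library(dplyr)

# Made-up long tables that share the same identifying columns.
confirmed <- data.frame(Province_State = c(NA, "Hubei"),
                        Country_Region = c("Italy", "China"),
                        Date = c("3/11/20", "3/11/20"),
                        Confirmed = c(10, 20))
deaths <- data.frame(Province_State = c(NA, "Hubei"),
                     Country_Region = c("Italy", "China"),
                     Date = c("3/11/20", "3/11/20"),
                     Deaths = c(1, 2))

# Join on all three identifying columns instead of a pasted key
joined <- full_join(confirmed, deaths,
                    by = c("Province_State", "Country_Region", "Date"))
joined
```

By default dplyr treats missing values in the join columns as equal to each other, so countries without a Province_State still line up. Lat and Long would need to be dropped from one side or added to the join columns when applying this to the real tables.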

Plot US, China, South Korea, Japan, Spain, Italy

data_time <- time_series_long_joined %>% 
    group_by(Country_Region,Date) %>% 
    summarise_at(c("Confirmed", "Deaths", "Recovered"), sum) %>% 
    filter (Country_Region %in% c("China","Korea, South","Japan","US", "Spain", "Italy")) 
p <- ggplot(data_time, aes(x = Date,  y = Confirmed, color = Country_Region)) + 
      geom_point() +
      geom_line() +
      ggtitle("Confirmed COVID-19 Cases") +
      geom_point(aes(group = seq_along(Date))) 
p

Plot Colombia, Brazil, Chile, Venezuela, Ecuador, Mexico

data_time <- time_series_long_joined %>% 
    group_by(Country_Region,Date) %>% 
    summarise_at(c("Confirmed", "Deaths", "Recovered"), sum) %>% 
    filter (Country_Region %in% c("Colombia","Brazil","Chile", "Venezuela","Ecuador", "Mexico")) 
p <- ggplot(data_time, aes(x = Date,  y = Confirmed, color = Country_Region)) + 
      geom_point() +
      geom_line() +
      ggtitle("Confirmed COVID-19 Cases") +
      geom_point(aes(group = seq_along(Date))) 
p

Animation

data_time <- time_series_long_joined %>% 
    group_by(Country_Region,Date) %>% 
    summarise_at(c("Confirmed", "Deaths", "Recovered"), sum) %>% 
    filter (Country_Region %in% c("China","Korea, South","Japan","US")) 
p <- ggplot(data_time, aes(x = Date,  y = Confirmed, color = Country_Region)) + 
      geom_point() +
      geom_line() +
      ggtitle("Confirmed COVID-19 Cases") +
      geom_point(aes(group = seq_along(Date))) +
      transition_reveal(Date) 
    
animate(p, end_pause = 15)
## Warning: No renderer available. Please install the gifski, av, or magick package
## to create animated output
covid <- read_csv(url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")) %>%
           rename(Province_State= "Province/State", Country_Region = "Country/Region") %>%
           pivot_longer(-c(Province_State, Country_Region, Lat, Long),
                  names_to = "Date", values_to = "Confirmed") %>%
           mutate(Date = mdy(Date) - days(1),
                  Place = paste(Lat,Long,sep="_")) %>%
# Summarizes state and province information
             group_by(Place,Date) %>%
           summarise(cumulative_cases = ifelse(sum(Confirmed)>0,
                     sum(Confirmed),NA_real_),
                     Lat = mean(Lat),
                     Long = mean(Long)) %>%
           mutate(Pandemic_day = as.numeric(Date - min(Date)))
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Province/State` = col_character(),
##   `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
world <- ggplot(covid,aes(x = Long, y = Lat, size = cumulative_cases/1000)) +
                 borders("world", colour = "gray50", fill = "grey90") +
                 theme_bw() +
                 geom_point(color='purple', alpha = .5) +
                 labs(title = 'Pandemic Day: {frame}',x = '', y = '',
                      size="Cases (x1000)") +
                 theme(legend.position = "right") +
                 coord_fixed(ratio=1.3)+
                 transition_time(Date) +
                 enter_fade()

animate(world, end_pause = 30)
## Warning: No renderer available. Please install the gifski, av, or magick package
## to create animated output
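The "No renderer available" warning means that none of the packages gganimate uses to write animated output was installed when this document was built. A small sketch, using base R only, to check which of the three packages named in the warning are available on your machine:

```r
# Packages named in the gganimate warning above
renderers <- c("gifski", "av", "magick")

# TRUE for each renderer package that can be loaded
available <- vapply(renderers, requireNamespace, logical(1), quietly = TRUE)
available

# Uncomment to install one renderer if none is available:
# if (!any(available)) install.packages("gifski")
```

After installing one of them and restarting R, calling animate() again should produce the animation instead of the warning.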